Piecewise Linear Multilayer Perceptrons and Dropout

Author

  • Ian J. Goodfellow
Abstract

We propose a new type of hidden layer for a multilayer perceptron, and demonstrate that it obtains the best reported performance for an MLP on the MNIST dataset.

1 The piecewise linear activation function

We propose to use a specific kind of piecewise linear function as the activation function for a multilayer perceptron. Specifically, suppose that the layer receives as input a vector x ∈ R^n. The layer then computes the presynaptic output z = xW + b, where W ∈ R^{n×N} and b ∈ R^N are learnable parameters of the layer. We propose to have each layer produce output via the activation function h(z)_i = max_{j ∈ S_i} z_j, where each S_i is a different non-empty set of indices into z. This function provides several benefits:

  • It is similar to the rectified linear units (Glorot et al., 2011), which have already proven useful for many classification tasks.
  • Unlike rectifier units, every unit is guaranteed to have some of its parameters receive a training signal at each update step. This is because the inputs z_j are compared only to each other, and not to 0, so one of them is always guaranteed to be the maximal element through which the gradient flows. In the case of rectified linear units, there is only a single element z_j and it is compared against 0; when 0 > z_j, z_j receives no update signal.
  • Max pooling over groups of units allows the features of the network to easily become invariant to some aspects of their input. For example, if a unit h_i pools (takes the max) over z_1, z_2, and z_3, and z_1, z_2, and z_3 respond to the same object in three different positions, then h_i is invariant to these changes in the object's position. A layer consisting only of rectifier units cannot take the max over features like this; it can only take their average.
  • Max pooling can reduce the total number of parameters in the network. If we pool with non-overlapping receptive fields of size k, then h has size N/k, and the next layer has its number of weight parameters reduced by a factor of k relative to a network that does not use max pooling. This makes the network cheaper to train and evaluate, and also more statistically efficient.
  • This kind of piecewise linear function can be seen as letting each unit h_i learn its own activation function. Given large enough sets S_i, h_i can implement increasingly complex convex functions of its input. This includes functions already used in other MLPs, such as the rectified linear function and absolute value rectification.
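As a concrete illustration of the layer described above, the following is a minimal NumPy sketch of a piecewise linear layer whose index sets S_i are non-overlapping groups of size k. The function name, variable names, and the use of NumPy are illustrative assumptions, not part of the paper.

    import numpy as np

    def piecewise_linear_layer(x, W, b, k):
        # x: input of shape (n,); W: weights of shape (n, N); b: biases of shape (N,)
        # k: size of each non-overlapping pooling group (assumed to divide N)
        z = x @ W + b                      # presynaptic output z, shape (N,)
        N = z.shape[0]
        assert N % k == 0, "pool size k must divide N"
        # h_i = max over the i-th group of k consecutive presynaptic outputs z_j
        return z.reshape(N // k, k).max(axis=1)

    # Example: n = 4 inputs, N = 6 presynaptic units, groups of size k = 3 give 2 outputs
    rng = np.random.default_rng(0)
    x = rng.standard_normal(4)
    W = rng.standard_normal((4, 6))
    b = np.zeros(6)
    h = piecewise_linear_layer(x, W, b, k=3)
    print(h.shape)  # (2,)

In this toy setting the layer still has all 4·6 + 6 parameters of z, but produces only N/k = 2 outputs, so the following layer needs k times fewer weights, which is the parameter reduction noted in the list above.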

Similar articles

Ensemble of Linear Experts as an Interpretable Piecewise-linear Classifier

In this study we propose a new ensemble model composed of several linear perceptrons. The objective of this study is to build a piecewise-linear classifier that is not only competitive with Multilayer Perceptrons (MLP) in generalization performance but also interpretable in the form of human-comprehensible rules. We present a simple competitive training method that allows the ensemble to effective...

Effect of nonlinear transformations on correlation between weighted sums in multilayer perceptrons

Nonlinear transformation is one of the major obstacles to analyzing the properties of multilayer perceptrons. In this letter, we prove that the correlation coefficient between two jointly Gaussian random variables decreases when each of them is transformed under continuous nonlinear transformations, which can be approximated by piecewise linear functions. When the inputs or the weights of a mul...

A performance comparison of trained multilayer perceptrons and trained classification trees

Multilayer Perceptrons and trained classification trees are two very different techniques which have recently become popular. Given enough data and time, both methods are capable of performing arbitrary nonlinear classification. We first consider the important differences between multilayer Perceptrons and classification trees and conclude that there is not enough theoretical basis for the clea...

Batch-normalized Maxout Network in Network

This paper reports a novel deep architecture referred to as Maxout network In Network (MIN), which can enhance model discriminability and facilitate the process of information abstraction within the receptive field. The proposed network adopts the framework of the recently developed Network In Network structure, which slides a universal approximator, multilayer perceptron (MLP) with rectifier u...

No bad local minima: Data independent training error guarantees for multilayer neural networks

We use smoothed analysis techniques to provide guarantees on the training loss of Multilayer Neural Networks (MNNs) at differentiable local minima. Specifically, we examine MNNs with piecewise linear activation functions, quadratic loss and a single output, under mild over-parametrization. We prove that for an MNN with one hidden layer, the training error is zero at every differentiable local mi...


Journal:
  • CoRR

Volume: abs/1301.5088   Issue: -

Pages: -

Publication date: 2013